Neighbor-joining phylogenetic tree of 150 human gut-derived commensal isolates, constructed from genomic ANI distances. The root mean squared error of the relative abundance (RA) estimates is shown aligned to each species.
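The per-species root mean squared error mentioned above can be sketched as follows. This is a minimal illustration, not the authors' pipeline: the function name `rmse_per_species` and the dictionary layout (species name mapped to RA in percent, one dictionary per sample) are assumptions, and a species missing from a profile is treated as having an RA of 0.

```python
import math

def rmse_per_species(true_ra, estimated_profiles):
    """Return {species: RMSE} of estimated RA against the true RA.

    true_ra: dict mapping species -> true RA (hypothetical layout).
    estimated_profiles: list of dicts mapping species -> estimated RA,
    one dict per sample; a missing species counts as RA 0.
    """
    result = {}
    for species, truth in true_ra.items():
        squared_errors = [
            (profile.get(species, 0.0) - truth) ** 2
            for profile in estimated_profiles
        ]
        result[species] = math.sqrt(sum(squared_errors) / len(squared_errors))
    return result
```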
False-positive species-level taxa reported by MetaPhlAn4 using default stat_q (0.20) with the 150-member bacterial synthetic community. Species with RA above 0.1% are shown, ordered by decreasing mean RA.
Mean relative abundance of C. difficile reported by each classifier for representative C. difficile isolates at varying levels of spiked-in C. difficile reads (1,000 to 50,000). Error bars show the standard error of the mean. Dotted line represents the target relative abundance.
Reported RA of C. difficile for representative isolates of C. difficile (n = 30) across various stat_q parameters (0.0-0.20). Black dashed line represents the target RA for 1,000 C. difficile reads (0.0133%).
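The target RA is the fraction of spiked-in reads among all reads in the sample. A minimal sketch of that arithmetic follows. The total of 7.5 million reads is an assumption back-calculated from the stated 0.0133% for 1,000 reads; it is not given in the captions, and the function name is hypothetical.

```python
def target_ra_percent(spike_reads, total_reads):
    """Target RA (%) of a spiked-in organism: spike reads over total reads."""
    if total_reads <= 0:
        raise ValueError("total_reads must be positive")
    return 100.0 * spike_reads / total_reads

# Assumed total depth, inferred from 1,000 reads corresponding to about 0.0133%.
ASSUMED_TOTAL_READS = 7_500_000

for spike in (1_000, 10_000, 50_000):
    print(spike, round(target_ra_percent(spike, ASSUMED_TOTAL_READS), 4))
```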
Reported RA of C. difficile for representative isolates of C. difficile (n = 30), compared between global stat_q adjustment (0.05) and local stat_q adjustment (0.05). Black dashed line represents the target RA for 1,000 C. difficile reads (0.0133%).
Mean RA of ST-11 strains (pink) compared to the mean RA of the remaining 28 strains (white) for 1,000-50,000 reads.
Reported RA of C. difficile for 30 isolates of C. difficile across various global stat_q parameters (0.0-0.20). Black line represents the target RA of C. difficile for 1,000 to 50,000 reads.
Binary classification performance on the synthetic metagenomic samples (n=10), compared among default stat_q (0.20), global stat_q (0.05), and local stat_q (0.05), measured as the purity of the taxonomic profile (S2C) and the completeness of the taxonomic profile (S2D). (S2E) L1 norm error, the total absolute difference between the true and predicted abundances of each species, for the synthetic metagenomic samples (n=10) under default stat_q (0.20), global stat_q (0.05), and local stat_q (0.05).
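The three metrics named above can be sketched for a single sample as follows. This assumes purity corresponds to precision (true positives over all reported species) and completeness to recall (true positives over all species truly present), a common convention in profiler benchmarking that the captions do not spell out. The function name and dictionary layout are hypothetical.

```python
def profile_metrics(true_ra, pred_ra):
    """Return (purity, completeness, l1_error) for one sample.

    true_ra, pred_ra: dicts mapping species -> relative abundance.
    A species counts as present when its abundance is above zero.
    """
    true_set = {s for s, a in true_ra.items() if a > 0}
    pred_set = {s for s, a in pred_ra.items() if a > 0}
    true_positives = len(true_set & pred_set)

    # Purity (assumed: precision) and completeness (assumed: recall).
    purity = true_positives / len(pred_set) if pred_set else 0.0
    completeness = true_positives / len(true_set) if true_set else 0.0

    # L1 norm error: summed absolute abundance difference over all species.
    l1_error = sum(
        abs(true_ra.get(s, 0.0) - pred_ra.get(s, 0.0))
        for s in true_set | pred_set
    )
    return purity, completeness, l1_error
```

With abundances expressed in percent, the L1 error ranges from 0 (identical profiles) to 200 (no shared species).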
Pairwise Bray-Curtis dissimilarity of the overall taxonomic profiles between stat_q adjustment (0.05) and default (0.20).
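For reference, the Bray-Curtis dissimilarity between two profiles of the same sample (for example, one generated at each stat_q setting) can be computed as in the sketch below. The function name and the dictionary input format are assumptions for illustration.

```python
def bray_curtis(profile_a, profile_b):
    """Bray-Curtis dissimilarity: sum|a_i - b_i| / sum(a_i + b_i).

    profile_a, profile_b: dicts mapping species -> abundance.
    Returns 0.0 for identical profiles and 1.0 when no species is shared.
    """
    species = set(profile_a) | set(profile_b)
    numerator = sum(
        abs(profile_a.get(s, 0.0) - profile_b.get(s, 0.0)) for s in species
    )
    denominator = sum(
        profile_a.get(s, 0.0) + profile_b.get(s, 0.0) for s in species
    )
    return numerator / denominator if denominator else 0.0
```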
Heatmap of the mean coverage of marker genes (n=200) across 30 CDC-curated C. difficile isolates, mapped using Bowtie2 (top), and heatmap of pairwise genomic ANI distances (bottom). Left annotation displays inclusion into 3 sets of marker genes that are non-uniformly represented across the representative genomes.
Neighbor-joining phylogenetic tree of C. difficile isolates derived from patient fecal samples (n=73). The tree was constructed from pairwise genomic ANI distances. The tree is annotated with the detection of C. difficile from the fecal metagenomes by MetaPhlAn4 at a stat_q of 0.20, and the sequence type of each isolate was determined using MLST.
Neighbor-joining phylogenetic tree of 30 C. difficile isolate genomes derived from the CDC EIP, constructed from genomic ANI distances.
Detection of C. difficile by PCR tests for the C. difficile 16S rRNA gene and tcdB, and by the Cobas C. difficile Test used for patient diagnosis (n=73). Diversity of each isolate is characterized using MLST. The detection of C. difficile using MetaPhlAn4 is compared between default (0.20) and local stat_q adjustment (0.05). The read counts for each metagenomic sample are binned every 4 million reads and capped at \(10^7\) reads. The quantified number of C. difficile 16S copies per mg of fecal sample is binned into high (\(\geq 10^5\)) or low (\(<10^5\)).
Number of isolates by ST from the C. difficile culture collection.
Detection of C. difficile from the metagenomes of the patient-derived fecal samples, compared between local stat_q adjustment (0.05) and default (0.20).
Pairwise Bray-Curtis dissimilarity of the overall taxonomic profiles between local stat_q adjustment (0.05) and default (0.20).
Genus-level taxonomy (top), detection of C. difficile 16S with PCR (middle), and MetaPhlAn4 estimates of C. difficile RA (bottom) of HD (n=21) and Cx-positive samples (n=73) using default MetaPhlAn4 (0.20).
Number of reported species in HD and Cx-positive fecal samples across the compared MetaPhlAn4 stat_q settings.
Alpha diversity, measured by the inverse Simpson index, of HD and Cx-positive fecal samples across the compared MetaPhlAn4 stat_q settings.
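The inverse Simpson index used above is the reciprocal of the sum of squared species proportions. A minimal sketch follows, assuming a list of per-species abundances for one sample; the function name is hypothetical, and the abundances are renormalized so that they may be given as counts, fractions, or percentages.

```python
def inverse_simpson(abundances):
    """Inverse Simpson index: 1 / sum(p_i^2), with p_i the species proportions.

    abundances: iterable of non-negative per-species abundances.
    Returns 0.0 when the sample contains no detected species.
    """
    values = [a for a in abundances if a > 0]
    total = sum(values)
    if total == 0:
        return 0.0
    return 1.0 / sum((a / total) ** 2 for a in values)
```

For a community of k equally abundant species the index equals k, so removing low-abundance false positives (for example by changing stat_q) lowers it only slightly.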
Maximum-likelihood optimized StrainPhlAn4 phylogenetic tree output of CDC C. difficile isolate spike-ins into patient stool metagenomic samples (n=5). Genomes were separated into three sets as previously described: set 1 (green), set 2 (red), and set 3 (blue). For each spike-in, the C. difficile genomes were randomly sampled at 30,000 (1x) or 60,000 (2x) reads while the reference genomes (gray) were sampled at 300,000 reads (10x) using optimized settings.
Maximum-likelihood optimized StrainPhlAn4 phylogenetic tree output of identical ST1 C. difficile strains spiked into patient stool metagenomic samples (n=5). For each spike-in, the C. difficile genome was randomly sampled at 30,000 reads while the reference genomes were sampled at 300,000 reads.
Pairwise branch length distances between in silico samples and the reference genome (bars) compared with the species-level Inverse Simpson metric of the metagenomic background (points).
Maximum-likelihood optimized StrainPhlAn4 phylogenetic tree output of 5 different C. difficile STs (pink) spiked into patient stool metagenomic samples (n=5). For each spike-in, the C. difficile genome was randomly sampled at 30,000 reads while the reference genomes were sampled at 300,000 reads.
Maximum-likelihood optimized StrainPhlAn4 phylogenetic tree output of 5 different ST1 isolates spiked into patient stool metagenomic samples (n=5). For each spike-in, the C. difficile genome was randomly sampled at 30,000 reads while the reference genomes were sampled at 300,000 reads.
Number of correctly detected iterations (n=10) for each CDC C. difficile isolate using default StrainPhlAn4 settings.
Number of correctly detected iterations (n=10) for each CDC C. difficile isolate across various StrainPhlAn4 settings.
Maximum-likelihood phylogenetic tree StrainPhlAn4 output of patient-derived C. difficile isolate spike-ins (red) into patient stool metagenomic samples (n=5). For each spike-in, the C. difficile genomes were randomly sampled at 30,000 (1x) or 60,000 (2x) reads while the isolated culture genomes (blue) and the reference tree (gray) were sampled at 300,000 reads (10x) using optimized settings.
Improved detection of C. difficile from patient fecal metagenomic samples using optimized StrainPhlAn4 compared to default (n=73).
Boxplots reporting the estimated C. difficile reads in samples detected as C. difficile positive by only MetaPhlAn4 or by MetaPhlAn4 and StrainPhlAn4.
Heatmap of pairwise branch length distances between patient metagenomic samples and cultured isolate genomes.
Heatmap of genomic ANI distances between C. difficile cultured isolate genomes.